-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Infer type of schema from JSON and RDF mutations. #4328
Conversation
If the same subject-predicate pair appears multiple times in a mutation and the predicate needs to be created, create it as a list to avoid losing data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 4 unresolved discussions (waiting on @danielmai, @manishrjain, @martinmr, and @MichaelJCompton)
chunker/json_parser.go, line 237 at r1 (raw file):
nquads []*api.NQuad nqCh chan []*api.NQuad forcedSinglePreds map[string]bool
predHints map[string]enum
chunker/json_parser.go, line 250 at r1 (raw file):
buf.nquads = make([]*api.NQuad, 0, batchSize) } buf.forcedSinglePreds = make(map[string]bool)
can have the proto here directly. Protos have maps.
chunker/json_parser.go, line 302 at r1 (raw file):
for _, pred := range metadata.GetForcedListPreds() { if _, ok := buf.forcedSinglePreds[pred]; ok { return errors.Errorf("")
fill in the blank.
chunker/json_parser.go, line 417 at r2 (raw file):
} buf.Push(&nq) if err := buf.PushPredHint(pred, pb.ParseMetadata_SINGLE); err != nil {
Do we really need to return an error just in case our "hints" don't match up? They are meant to be hints and guesses, not stop the program.
Defaults to a list instead to avoid destroying data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 4 unresolved discussions (waiting on @danielmai, @manishrjain, and @MichaelJCompton)
chunker/json_parser.go, line 237 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
predHints map[string]enum
Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.
chunker/json_parser.go, line 250 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
can have the proto here directly. Protos have maps.
Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.
chunker/json_parser.go, line 302 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
fill in the blank.
Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.
chunker/json_parser.go, line 417 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Do we really need to return an error just in case our "hints" don't match up? They are meant to be hints and guesses, not stop the program.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a few comments. Good to go after.
Reviewed 1 of 9 files at r1, 7 of 10 files at r2, 3 of 3 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @danielmai, @manishrjain, @martinmr, and @MichaelJCompton)
chunker/json_parser.go, line 536 at r3 (raw file):
// ParseJSON is a convenience wrapper function to get all NQuads in one call. This can however, lead // to high memory usage. So be careful using this. func ParseJSON(b []byte, op int) ([]*api.NQuad, *pb.ParseMetadata, error) {
Maybe return *pb.Mutation.
chunker/rdf_parser.go, line 58 at r3 (raw file):
// ParseRDFs is a convenience wrapper function to get all NQuads in one call. This can however, lead // to high memory usage. So, be careful using this. func ParseRDFs(b []byte) ([]*api.NQuad, *pb.ParseMetadata, error) {
This can return *pb.Mutation, I suppose.
protos/pb.proto, line 224 at r3 (raw file):
string drop_value = 8; ParseMetadata parse_metadata = 9;
Metadata metadata
protos/pb.proto, line 227 at r3 (raw file):
} message ParseMetadata {
s/ParseMetadata/Metadata
This reverts commit 2a2d428.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 11 files reviewed, 8 unresolved discussions (waiting on @danielmai, @manishrjain, and @MichaelJCompton)
chunker/json_parser.go, line 536 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Maybe return *pb.Mutation.
pb.Mutations does not have a way to store the nquads inside it so this method would still need to return two objects. There's not much I would gain from switching pb.Metadata to pb.Mutation if that's what you meant instead.
chunker/rdf_parser.go, line 58 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This can return *pb.Mutation, I suppose.
pb.Mutations does not have a way to store the nquads inside it so this method would still need to return two objects.
protos/pb.proto, line 224 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Metadata metadata
Done.
protos/pb.proto, line 227 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
s/ParseMetadata/Metadata
Done.
If the same subject-predicate pair appears multiple times in a mutation and the predicate needs to be created, create it as a list to avoid losing data. (cherry picked from commit 4da3614)
This change changes the parser (for both RDF and JSON) to send hints about the type
of the predicates (list or non-list) so predicates that are created during mutations are
created with the appropriate type.
Fixes #3788
This change is